Topic detection with recursive consensus clustering and semantic enrichment
Abstract
Extracting meaningful information from short texts like tweets has proved to be a challenging task. Literature on topic detection focuses mostly on methods that try to guess the plausible words that describe topics whose number has been decided in advance. Topics change according to the initial setup of the algorithms and show a consistent instability, with words moving from one topic to another one. In this paper we propose an iterative procedure for topic detection that searches for the most stable solutions in terms of words describing a topic. We use an iterative procedure based on clustering on the consensus matrix, and traditional topic detection, to find both a stable set of words and an optimal number of topics. We observe, however, that in several cases the procedure does not converge to a unique value but oscillates. We further enhance the methodology using semantic enrichment via Word Embedding, with the aim of reducing noise and improving topic separation. We foresee the application of this set of techniques in automatic topic discovery in noisy channels such as Twitter or social media.
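The idea of a consensus matrix over repeated topic-detection runs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy word list, the 0.8 threshold, and the use of connected components in place of a full clustering step are all assumptions made for illustration, and the recursive refinement and semantic enrichment described in the abstract are not included.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Fraction of runs in which each pair of words shares a topic.

    runs[r][i] is the topic label assigned to word i in run r.
    Label values need not match across runs; only co-assignment counts.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


def stable_groups(matrix, threshold=0.8):
    """Group words linked by a co-assignment frequency >= threshold.

    Assumption: connected components (via union-find) stand in for
    the clustering step applied to the consensus matrix.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: three runs of some topic detector on five words.
words = ["goal", "match", "team", "vote", "election"]
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],  # "team" drifts to the other topic in this run
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(runs)
groups = stable_groups(matrix, threshold=0.8)
stable_words = [[words[i] for i in g] for g in groups]
```

In this toy input, "team" changes topic in one of the three runs, so it falls below the threshold with every other word and ends up in a group of its own, while the pairs that always co-occur stay together. This is the kind of instability the abstract describes as words moving from one topic to another.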
Similar resources
Text clustering for topic detection
The world wide web represents vast stores of information. However, the sheer amount of such information makes it practically impossible for any human user to be aware of much of it. Therefore, it would be very helpful to have a system that automatically discovers relevant, yet previously unknown information, and reports it to users in human-readable form. As the first attempt to accomplish such...
Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering
Uncovering common themes from a large number of unorganized search queries is a primary step to mine insights about aggregated user interests. Common topic modeling techniques for document modeling often face sparsity problems with search query data as these are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) wo...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
AHP algorithm and unsupervised clustering in auto insurance fraud detection
This thesis studies insurance fraud in Iran's automobile insurance industry, explores the use of an expert linkage between unsupervised clustering and the analytic hierarchy process (AHP), and reports the findings from applying these algorithms to automobile insurance claim fraud detection. The expert linkage determination objective function plan provides us with a way to determine whi...
Consensus Clustering + Meta Clustering = Multiple Consensus Clustering
Consensus clustering and meta clustering are two important extensions of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings, and meta clustering aims to group similar input clusterings together so that users only need to examine a smal...
Journal
Journal title: Humanities & Social Sciences Communications
Year: 2023
ISSN: 2662-9992
DOI: https://doi.org/10.1057/s41599-023-01711-0